A Symbolic Corpus-based Approach to Detect and Solve the Ambiguity of Discourse Markers

Authors: Iria da Cunha

Research in Computing Science, Vol. 70, pp. 93-104, 2013.

Abstract: Discourse parsing is currently an important research topic, and Rhetorical Structure Theory (RST) is one of the most popular approaches in this field. In general, discourse parsing comprises three stages: discourse segmentation, detection of discourse relations, and construction of rhetorical trees. Different strategies are used to develop discourse parsers. One strategy for detecting discourse relations relies on symbolic rules that take into account linguistic clues, such as discourse markers. However, some discourse markers are ambiguous, that is, they can indicate more than one discourse relation, which is a problem when assigning discourse relations automatically. In this paper, a symbolic approach to detect and resolve the ambiguity of discourse markers in Spanish is developed. First, we detect ambiguous discourse markers, using the training corpus of the RST Spanish Treebank. Second, we extract linguistic contexts for these markers. Third, we design linguistic rules to resolve the ambiguity of the markers. Fourth, we evaluate the rules, using the test corpus of the RST Spanish Treebank. Our approach outperforms a baseline created following the methodology of the state of the art. Therefore, we consider that the results obtained in our experiments are representative and constitute a first step towards the disambiguation of discourse marker senses in Spanish. However, there is room for improvement, and the main limitations of the approach are presented. In the future, the rules will be integrated into a discourse parser for Spanish, and several related applications will be developed (automatic summarization and information extraction, among others).

Keywords: Discourse Parsing, Discourse Markers, Ambiguity, Corpus, Rhetorical Structure Theory, Spanish
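
As an illustration of the first step described in the abstract (detecting ambiguous discourse markers in a training corpus), the following is a minimal sketch, not the paper's implementation. It assumes the annotations have already been reduced to (marker, relation) pairs; the function name find_ambiguous_markers and the toy pairs are hypothetical and are not taken from the RST Spanish Treebank.

```python
from collections import defaultdict

# Hypothetical toy annotations: (discourse marker, RST relation) pairs.
# Illustrative only; these are NOT drawn from the RST Spanish Treebank.
TRAIN_PAIRS = [
    ("porque", "Cause"),
    ("porque", "Justify"),
    ("es decir", "Restatement"),
    ("para", "Purpose"),
    ("para", "Purpose"),
]


def find_ambiguous_markers(pairs):
    """Return markers observed with more than one relation.

    The result maps each ambiguous marker (lowercased) to the sorted
    list of relations it was seen with.
    """
    relations_by_marker = defaultdict(set)
    for marker, relation in pairs:
        relations_by_marker[marker.lower()].add(relation)
    return {
        marker: sorted(relations)
        for marker, relations in relations_by_marker.items()
        if len(relations) > 1
    }


if __name__ == "__main__":
    print(find_ambiguous_markers(TRAIN_PAIRS))
```

Under these assumptions, a marker counts as ambiguous as soon as it co-occurs with two or more distinct relations. A real pipeline would probably also need a frequency threshold and the surrounding linguistic context of each occurrence, which this sketch leaves out.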

PDF: A Symbolic Corpus-based Approach to Detect and Solve the Ambiguity of Discourse Markers